Author's Abstract 



Standard mathematical notation works well for short formulas, but not for 
the longer ones often written by computer scientists. Notations are proposed 
to make one- or two-page formulas easier to read and reason about. 



Introduction 

Mathematicians seldom write formulas longer than a dozen or so lines. Com- 
puter scientists often write much longer formulas. For example, an invariant 
of a concurrent algorithm can occupy more than a page, and the specifica- 
tion of a real system can be a formula dozens or even hundreds of pages 
long. Standard mathematical notation works well for short formulas, but 
not for long ones. I propose a few simple notations for writing formulas of 
up to a couple of pages. These notations can make formulas much easier to 
read and reason about. 

Formulas significantly longer than two pages require hierarchical struc- 
turing. Methods for structuring long programs can be used to structure 
long formulas. Programs of less than a dozen or so pages can be adequately 
structured with procedures; longer programs require some method of group- 
ing procedures into modules. The definition is the mathematical analog of 
the procedure. Definitions suffice for structuring formulas of up to about 
a dozen pages. For longer formulas, some form of module structure is also 
needed. 

Any formula can be written with a hierarchy of definitions, each only a 
few lines long. However, just as programs become hard to read if broken into 
too many procedures, formulas are hard to read if broken into definitions 
that are too small. In my experience, the best way to structure a long 
formula is in terms of individual formulas of up to a page or two. 

Writing Formulas 

Consider the following definition, written with standard mathematical con- 
ventions. (The examples come from the invariant of an unpublished cor- 
rectness proof for a cache coherence algorithm; the reader is not expected 
to understand them.) 



    memQLoc(a) =  ⎧ "None"       if Locs = ∅
                  ⎩ max(Locs)    otherwise

      where Locs = {i ∈ {0 ... |memQ| - 1} :
                        (memQ[i].req.type = "Write")
                      ∧ (memQ[i].req.adr = a) }

This definition is easy to read because it is short. However, suppose that 
"None" and max(Locs) were replaced by much longer expressions. We would 
then see that the "where" construct is bad because it forces us to read the 
entire definition of memQLoc(a) before we learn what Locs is. A structure 
that scales better to large formulas is 

    let Locs = {i ∈ {0 ... |memQ| - 1} :
                    (memQ[i].req.type = "Write")
                  ∧ (memQ[i].req.adr = a) }
    in  memQLoc(a) =  ⎧ "None"       if Locs = ∅
                      ⎩ max(Locs)    otherwise



Suppose once again that "None" were replaced by a long expression e, perhaps 
crossing onto the next page. The typographic difficulties posed by 
the resulting large left brace are daunting. Simply removing the brace still 
leaves us with the problem of where to put the condition Locs = ∅. If it 
goes after e, we have to read several lines before discovering the structure 
of the definition. If it goes at the end of the first line, we read the Locs = ∅ 
in the middle of reading e. A better notation is the if/then/else construct 
used in programming languages. 



    let Locs = {i ∈ {0 ... |memQ| - 1} :
                    (memQ[i].req.type = "Write")
                  ∧ (memQ[i].req.adr = a) }
    in  memQLoc(a) = if Locs = ∅ then "None"
                                 else max(Locs)



The if/then/else makes the structure immediately clear, even for long for- 
mulas. The obvious analog of the case construct of programming languages 
works for definitions with more than two alternatives. The customary closing 
end (or fi) is unnecessary, because we can use parentheses and indentation 
to delimit the scope of an if or case. 
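
As a rough illustration of how directly the let/in and if/then/else form 
carries over to a programming language, here is a minimal Python sketch of 
the same definition. The dictionary layout of the queue entries (keys "req", 
"type", "adr") and the sample queue are assumptions made only for this 
example; they are not taken from the correctness proof. 

```python
def mem_q_loc(mem_q, a):
    """Position of the last queued write to address a, or "None"."""
    # let Locs = {i in 0 .. |memQ| - 1 : type is "Write" and adr is a}
    locs = {
        i
        for i, entry in enumerate(mem_q)
        if entry["req"]["type"] == "Write" and entry["req"]["adr"] == a
    }
    # in if Locs is empty then "None" else max(Locs)
    return "None" if not locs else max(locs)


# Hypothetical queue contents, for illustration only.
sample_queue = [
    {"proc": 0, "req": {"type": "Write", "adr": 7, "val": 1}},
    {"proc": 1, "req": {"type": "Read", "adr": 7}},
    {"proc": 0, "req": {"type": "Write", "adr": 7, "val": 2}},
]

last_write = mem_q_loc(sample_queue, 7)
no_write = mem_q_loc(sample_queue, 3)
```

Here last_write is the index of the most recent write to address 7, and 
no_write is the string "None" because nothing in the queue writes address 3. 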

The original version of the definition had an important feature that has 
been lost in these transformations: we could see at once that it was a 
definition of memQLoc(a). One further change recovers this feature. 

    memQLoc(a) = let Locs = {i ∈ {0 ... |memQ| - 1} :
                                 (memQ[i].req.type = "Write")
                               ∧ (memQ[i].req.adr = a) }
                 in  if Locs = ∅ then "None"
                                 else max(Locs)

The basic problem with the "if . . . otherwise" construct is shared by all 
infix operators: we discover the high-level structure only after reading to 
the end of the first argument. Consider the following formula. 

    (∀c ∈ CacheAddress :
         cache[p, c] ∈ ([adr : Address, val : Value] ∪ {"Invalid"}))
    ∧ ((request[p] ∈ Request)
       ∨ ((request[p] = "Ready") ∧ (state[p] = "Idle")))
    ∧ (response[p] ∈ Value)

We have to read to the end of the second line, and count parentheses, before 
learning that the formula is a conjunction. One possible solution is prefix 
notation, writing ∧(A, B, C) instead of A ∧ B ∧ C. 

    ∧ (∀c ∈ CacheAddress :
           cache[p, c] ∈ ([adr : Address, val : Value] ∪ {"Invalid"}),
       ∨ (request[p] ∈ Request,
          ∧ (request[p] = "Ready",
             state[p] = "Idle")),
       response[p] ∈ Value)

This formula is easy to read only because of the way it is indented. If one 
needs indentation anyway, why not use it to eliminate the parentheses and 
commas required by a prefix notation? We write the formula A1 ∧ A2 ∧ ... ∧ An 
as the aligned list 

    ∧ A1
    ∧ A2
      ...
    ∧ An

and write disjunctions similarly. We can then use indentation to eliminate 
parentheses, writing the formula above as 

    ∧ ∀c ∈ CacheAddress :
         cache[p, c] ∈ ([adr : Address, val : Value] ∪ {"Invalid"})
    ∧ ∨ request[p] ∈ Request
      ∨ ∧ request[p] = "Ready"
        ∧ state[p] = "Idle"
    ∧ response[p] ∈ Value

We continue to use ∧ and ∨ as infix operators in subformulas. For example, 
the second conjunct of this formula can also be written 

    ∧ ∨ request[p] ∈ Request
      ∨ (request[p] = "Ready") ∧ (state[p] = "Idle")
The list convention for conjunction and disjunction can be used for other 
associative operators, including addition and multiplication. However, it 
does not work for the nonassociative boolean operator ⇒ (implies). I have 
not found a good general method of writing A ⇒ B when A and B are long 
formulas. When A and B are conjunctions or disjunctions, the format 

       ∧ A1
         ...
       ∧ Am
    ⇒  ∧ B1
         ...
       ∧ Bn

works fairly well if A1 ∧ ... ∧ Am is only a few lines long. 

Writing conjunctions and disjunctions as lists lets us take full advantage 
of indentation to eliminate parentheses. Indentation has meaning; shifting 
an expression to the left or right changes the way a formula is parsed. It is 
not hard to devise precise rules for parsing these two-dimensional formulas. 
However, there is some question about what formulas should be allowed. 
For example, should it be legal to write (A1 ∨ A2) ∧ B as follows? 

      ∨ A1
      ∨ A2
    ∧ B

Answers to these questions will evolve as people use the notation. 
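
To make the idea of parsing by indentation concrete, the following Python 
sketch implements one possible rule; it is an assumption of this example, 
not a rule fixed by the text. A list consists of the lines whose operator 
sits in the same column, and every line indented further to the right 
belongs to the item above it. The ASCII pairs /\ and \/ stand in for the 
conjunction and disjunction signs, and the function names parse and 
parse_text are hypothetical. 

```python
BULLETS = ("/\\", "\\/")  # ASCII stand-ins for the conjunction and disjunction signs


def parse(lines):
    """Turn (column, text) pairs into a nested (operator, items) tree."""
    col, text = lines[0]
    bullet = next((b for b in BULLETS if text.startswith(b)), None)
    if bullet is None:
        # No leading bullet: the whole block is one atomic formula.
        return " ".join(t for _, t in lines)
    items = []
    for c, t in lines:
        if c == col:
            # A line at the list's own column must start a new item.
            if not t.startswith(bullet):
                raise ValueError(f"expected {bullet} at column {col}: {t!r}")
            rest = t[len(bullet):]
            inner = rest.lstrip()
            start = c + len(bullet) + len(rest) - len(inner)
            items.append([(start, inner)])
        elif c > col:
            # Indented further right: continues the current item.
            items[-1].append((c, t))
        else:
            raise ValueError(f"line lies to the left of its list: {t!r}")
    op = "and" if bullet == "/\\" else "or"
    return (op, [parse(item) for item in items])


def parse_text(src):
    """Parse a formula given as text indented with spaces (no tabs)."""
    rows = [ln for ln in src.splitlines() if ln.strip()]
    return parse([(len(ln) - len(ln.lstrip()), ln.strip()) for ln in rows])


tree = parse_text(r'''
/\ \/ request[p] in Request
   \/ /\ request[p] = "Ready"
      /\ state[p] = "Idle"
/\ response[p] in Value
''')

# The questionable layout: a disjunction list followed by a conjunct to its left.
try:
    parse_text("   \\/ A1\n   \\/ A2\n/\\ B")
    rejected = False
except ValueError:
    rejected = True
```

Under this particular rule the questionable layout is rejected, because the 
final line lies to the left of the list it follows. A more permissive rule 
could accept it; the sketch only shows that either choice can be stated 
precisely. 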

Numbering Parts of Formulas 

We don't just write formulas, we also reason about them. Reasoning about 
a large formula requires a convenient way of referring to its components. 
With the list convention, we can name individual conjuncts and disjuncts by 
numbering them. The ith conjunct or disjunct of a formula named F is called 
F.i. A universally quantified formula can be viewed as a conjunction, where 
the yth conjunct of ∀x : Q is Q[y/x], the formula obtained by substituting y 
for x in Q. If F is the name of the formula ∀x : Q, then we take F(y) to be 
the name of the formula Q[y/x]. A similar convention applies to existential 
quantification. 
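
The naming convention can be read as a path through the formula's syntax 
tree. The Python sketch below is one way to model it, under assumptions that 
belong to this example only: a conjunction or disjunction is a pair 
("and" or "or", list of parts), a quantified formula is a pair 
("forall" or "exists", function from a value to the body), and the helper 
part and the sample formula F are hypothetical. 

```python
def part(formula, *steps):
    """Follow a name such as F.1(q).2 through a formula tree.

    An integer step i selects the i-th conjunct or disjunct (F.i);
    a one-element tuple (y,) instantiates a quantifier (F(y)).
    """
    for step in steps:
        if not isinstance(formula, tuple):
            raise ValueError(f"{formula!r} has no named parts")
        kind, body = formula
        if isinstance(step, int):
            if kind not in ("and", "or"):
                raise ValueError("F.i needs a conjunction or disjunction")
            formula = body[step - 1]  # parts are numbered from 1
        else:
            if kind not in ("forall", "exists"):
                raise ValueError("F(y) needs a quantified formula")
            (y,) = step
            formula = body(y)  # plays the role of Q[y/x]
    return formula


# A small hypothetical formula: a conjunction whose first part is quantified.
F = ("and", [
    ("forall", lambda p: ("and", [
        f"response[{p}] in Value",
        ("or", [
            f"request[{p}] in Request",
            ("and", [f'request[{p}] = "Ready"', f'state[{p}] = "Idle"']),
        ]),
    ])),
    "memQ in SequenceOf(...)",
])

second = part(F, 2)                   # F.2
nested = part(F, 1, ("q",), 2, 1)     # F.1(q).2.1
```

Representing the quantifier body as a function makes the substitution 
Q[y/x] an ordinary function call, which avoids textual substitution in this 
sketch. 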

Figure 1 illustrates the use of these structuring and naming conventions 
in a real example: the definition of an invariant I for a cache coherence 
algorithm. For simplicity, only the outermost three levels of conjuncts 
and disjuncts are labeled. (I like to label conjuncts with numbers and 
disjuncts with letters.) The naming convention implies that I.2 is the 
formula memQ ∈ SequenceOf(...), and I.1(q).3.a is the formula 
request[q] ∈ Request. 

I = let cacheLocs(p, a) = {c ∈ CacheAddress : ∧ cache[p, c] ≠ "Invalid"
                                               ∧ cache[p, c].adr = a }
        inCache(p, a)   = cacheLocs(p, a) ≠ ∅
        memQLoc(a)      = let Locs = {i ∈ {0 ... |memQ| - 1} :
                                          ∧ memQ[i].req.type = "Write"
                                          ∧ memQ[i].req.adr = a }
                          in  if Locs = ∅ then "None"
                                          else max(Locs)
        memVal(a)       = if memQLoc(a) = "None"
                            then mainMemory[a]
                            else memQ[memQLoc(a)].req.val
    in  1.∧ ∀p ∈ Process :
              1.∧ ∀a ∈ Address :
                    1.∧ #cacheLocs(p, a) ≤ 1
                    2.∧ inCache(p, a) ⇒ (cacheVal(p, a) = memVal(a))
                    3.∧ mainMemory[a] ∈ Value
              2.∧ ∀c ∈ CacheAddress :
                    cache[p, c] ∈ ([adr : Address, val : Value] ∪ {"Invalid"})
              3.∧ a.∨ request[p] ∈ Request
                  b.∨ ∧ request[p] = "Ready"
                      ∧ state[p] = "Idle"
              4.∧ response[p] ∈ Value
              5.∧ 1.∧ state[p] ∈ {"RdCache", "MemWait",
                                   "BusWait", "WrDone", "Idle"}
                  2.∧ (state[p] = "RdCache") ⇒ ∧ request[p].type = "Read"
                                                ∧ inCache(p, request[p].adr)
                  3.∧ (state[p] = "MemWait")
                        ⇒ ∧ ¬inCache(p, request[p].adr)
                          ∧ #{i ∈ {0 ... |memQ| - 1} :
                                ∧ p = memQ[i].proc
                                ∧ memQ[i].req.type = "Read"} = 1
                  4.∧ (state[p] = "BusWait") ⇒ ∧ (request[p].type = "Read")
                                                ∧ ¬inCache(p, request[p].adr)
                  5.∧ (state[p] = "WrDone") ⇒ (request[p].type = "Write")
        2.∧ memQ ∈ SequenceOf([proc : Process, req : Request])
        3.∧ ∀i ∈ {0 ... |memQ| - 1} :
              memQ[i].req.type = "Read" ⇒
                1.∧ state[memQ[i].proc] = "MemWait"
                2.∧ request[memQ[i].proc] = memQ[i].req

Figure 1: An invariant of a cache coherence algorithm. 

Conclusion 

The notations introduced here will be unfamiliar to most readers, and un- 
familiar notation usually seems unnatural. I have used the notations for 
several years, and I now find them indispensable. I urge the reader to 
rewrite formula I of Figure 1 in conventional notation and compare it with 
the original. Having to keep track of six or seven levels of parentheses reveals 
the advantage of using indentation to eliminate parentheses. 